An Integrated Approach to Rating and Filtering Web Content
Authors
Abstract
In this poster, we illustrate an integrated approach to Web filtering. Its main features are flexible filtering policies that take into account both users’ characteristics and resource content, the specification of an ontology for the filtering domain, and support for the main filtering strategies currently available. Our approach has been implemented in two prototypes, which address the needs of both home and institutional users and enforce filtering strategies more sophisticated and flexible than those currently available.

Web content filtering concerns the evaluation of Web resources in order to verify whether they satisfy given parameters. Although this definition is quite general and applies to diverse applications, Web filtering has so far been enforced mainly to protect users (e.g., minors) from possibly ‘harmful’ content (e.g., pornography, violence, racism).

The filtering systems currently available can be grouped into two main classes. The first adopts a list-based approach, according to which Web sites are classified as either ‘appropriate’ (white lists) or ‘inappropriate’ (black lists). In the second, Web resources are described by metadata associated with them, which are used to evaluate whether they can be accessed, depending on the preferences specified by the end user or a supervisor. This approach is adopted mainly by the rating systems based on the PICS (Platform for Internet Content Selection) W3C standard [1], which defines a general format for content labels to be associated with Web sites. Both strategies have been criticized for enforcing restrictive and rather ineffective filtering. In fact, their classification of Web resources is semantically poor, which makes it impossible to distinguish between categories concerning similar content (e.g., pornography and gynecology).
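The two classes of filters described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the host names, the single `nudity` category, and the numeric levels are hypothetical, and the label is a plain dictionary rather than the actual PICS label syntax.

```python
from urllib.parse import urlparse

# Hypothetical lists; neither the hosts nor the categories come from the poster.
BLACK_LIST = {"adult.example"}
WHITE_LIST = {"kids.example"}


def list_based_allows(url: str, default_allow: bool = True) -> bool:
    """List-based strategy: the decision depends only on the host name."""
    host = urlparse(url).hostname or ""
    if host in BLACK_LIST:
        return False
    if host in WHITE_LIST:
        return True
    # Unlisted sites fall back to a default, which is where under-blocking
    # (default allow) or over-blocking (default deny) comes from.
    return default_allow


def label_based_allows(label: dict[str, int], max_levels: dict[str, int]) -> bool:
    """Rating-label strategy: every rated category must stay within the
    maximum level chosen by the end user or a supervisor.
    Categories missing from the preferences are assumed to allow level 0 only."""
    return all(level <= max_levels.get(category, 0)
               for category, level in label.items())


# A coarse label cannot tell a medical site from a pornographic one:
# both carry the same rating, so both receive the same decision.
medical_label = {"nudity": 3}
adult_label = {"nudity": 3}
preferences = {"nudity": 1}
```

With these preferences both labels are rejected, so the medical site is over-blocked; raising the threshold to 3 would instead admit both, which under-blocks.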
Because of this semantically poor classification, such systems often under- and/or over-block access to the Web: respectively, they allow users to access inappropriate resources, or they prevent users from accessing appropriate ones. The metadata-based approach should overcome such drawbacks, since it would allow one to specify a precise and unambiguous description of resources, but this is not true of the available metadata-based rating and filtering systems.

E. Bertino et al., in M. Ali and F. Esposito (Eds.): IEA/AIE 2005, LNAI 3533, pp. 749–751, 2005. © Springer-Verlag Berlin Heidelberg 2005.

In order to address the issues of Web content rating and filtering, we developed an integrated approach which, besides supporting both the list- and metadata-based strategies, defines content labels providing an accurate description of Web resources and takes into account users’ characteristics in order to enforce flexible filtering policies. The outcome of our work, formerly carried out in the framework of the EU project EUFORBIA, has been two prototypes, addressing the needs of institutional and home users, and an ontology (namely, the EUFORBIA ontology) for the specification of content labels.

The EUFORBIA ontology is an extension of the general NKRL (Narrative Knowledge Representation Language) ontology [2] that covers the pornography, violence, and racism domains. NKRL is used to specify the EUFORBIA content labels, which consist of three sections: the first concerns the aims of the Web site, the second describes its relevant characteristics and content, and the third outlines the Web site’s main sections. It is important to note that, unlike the currently available PICS-based content labels, a EUFORBIA label does not rate a Web site only with respect to the content liable to be filtered; since the NKRL ontology is general purpose, the label provides a precise and objective description of the site’s content and characteristics.
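The three-section label described above can be approximated by a small data structure. This is only a sketch under strong assumptions: real EUFORBIA labels are encoded in NKRL, whereas here each section is reduced to plain strings, and the class names, topic names, and the example policy are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SiteSection:
    """One entry of the third label section (the site's main sections)."""
    title: str
    topics: set[str] = field(default_factory=set)


@dataclass
class ContentLabel:
    aims: set[str]                       # section 1: aims of the Web site
    characteristics: set[str]            # section 2: relevant characteristics and content
    sections: list[SiteSection] = field(default_factory=list)  # section 3

    def all_topics(self) -> set[str]:
        """Topics found in the site-level description or in any main section."""
        topics = set(self.characteristics)
        for section in self.sections:
            topics |= section.topics
        return topics


def allows(label: ContentLabel) -> bool:
    """Hypothetical policy: sexual content is blocked unless the site's
    stated aim is to provide medical information."""
    if "sexual_content" not in label.all_topics():
        return True
    return "medical_information" in label.aims


clinic = ContentLabel(aims={"medical_information"},
                      characteristics={"sexual_content"})
adult = ContentLabel(aims={"entertainment"}, characteristics=set(),
                     sections=[SiteSection("gallery", {"sexual_content"})])
```

Because the label records the aims of the site separately from its content, the same topic leads to different decisions for `clinic` and `adult`, which a single rating level cannot express.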
Because EUFORBIA labels describe a site’s content rather than merely rating it, we can specify policies more sophisticated than, e.g., “user u cannot access pornographic Web sites”, and it is possible to distinguish more precisely between, e.g., an actually pornographic Web site and a Web site addressing sexual topics and content from a non-pornographic (e.g., medical) point of view. The EUFORBIA ontology and the corresponding labels are used by two filtering prototypes which enforce complementary strategies for addressing end users’ needs (for detailed information concerning EUFORBIA, we refer the reader to the project Web site: http://semioweb.msh-paris.fr/euforbia).

The first prototype, referred to as NKRL-EUFORBIA [3], allows end users to generate EUFORBIA labels and associate them with Web resources, and to build a user profile by specifying NKRL-encoded filtering policies. NKRL-EUFORBIA can run either server- or client-side, and it consists of three main modules: the Annotation Manager, which allows the creation of well-formed NKRL ‘conceptual annotations’ to be used for encoding EUFORBIA labels; the Web Browser, which allows the definition of a user profile and safe navigation over the Internet; and the Web Filter, which is used by the Web Browser module to determine whether access to a requested resource must be granted.

The second EUFORBIA prototype [3, 4], whose current version is referred to as MFilter [5], is a proxy filter specifically designed for institutional users, who must manage access to Web content for a large number of heterogeneous users. MFilter implements a model according to which filtering policies can be specified on either the identity or the characteristics of users and resources. Users are characterized by associating with them ratings, which are organized into a hierarchy and denoted by a set, possibly empty, of attributes. An example of a user rating system is depicted in Figure 1.
Thus, unlike the available filtering systems, which make use of predefined and static profiles, MFilter allows one to specify policies which take into account both user ratings and attribute values (e.g., “all the students whose age is less than 16”). Resources are rated according to the me
[Figure 1: example of a user rating system. Recoverable fragment: PERSON (+id: int, +name: string)]
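A policy such as “all the students whose age is less than 16” combines a user rating with a condition on attribute values. The sketch below shows one way this could be evaluated; it is not MFilter's actual model. The `STUDENT` and `TEACHER` ratings, the deny-only semantics, and the resource categories are assumptions, and only `PERSON` with `id` and `name` is taken from the figure fragment.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical rating hierarchy: child rating -> parent rating.
PARENT: dict[str, Optional[str]] = {
    "PERSON": None,
    "STUDENT": "PERSON",
    "TEACHER": "PERSON",
}


def is_a(rating: str, ancestor: str) -> bool:
    """True if `rating` equals `ancestor` or descends from it."""
    current: Optional[str] = rating
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


@dataclass
class User:
    id: int
    name: str
    rating: str
    attributes: dict


@dataclass
class Policy:
    rating: str                        # applies to this rating and its descendants
    condition: Callable[[dict], bool]  # constraint on the user's attribute values
    denied_categories: frozenset       # assumed: policies only deny access

    def applies_to(self, user: User) -> bool:
        return is_a(user.rating, self.rating) and self.condition(user.attributes)


def can_access(user: User, resource_categories: set, policies: list) -> bool:
    """Deny if any applicable policy forbids one of the resource's categories."""
    for policy in policies:
        if policy.applies_to(user) and policy.denied_categories & resource_categories:
            return False
    return True


# "All the students whose age is less than 16" (a missing age is treated
# conservatively as under 16), here assumed to be denied violent content.
policies = [Policy("STUDENT", lambda a: a.get("age", 0) < 16, frozenset({"violence"}))]
young = User(1, "Ann", "STUDENT", {"age": 14})
older = User(2, "Bob", "STUDENT", {"age": 17})
teacher = User(3, "Eve", "TEACHER", {"age": 40})
```

Because the policy is attached to a rating plus a predicate rather than to a fixed profile, adding a new student requires no new policy, which is the flexibility the text contrasts with predefined, static profiles.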
Similar resources
Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems
One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user’s item ratings with the historical rating records of other users to find similar users, and recommends items which seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...
Image Rating System for Filtering Web Pages with Inappropriate Contents
We have developed a prototype system with image discrimination for the filtering and rating of web pages displaying inappropriate content. We used the SafetyOnline rating standard for the system. The rating standard defines five categories having five levels. The system rates web pages and classifies them into five levels of inappropriateness for each category according to the rating standard. ...
Performance Enhancement in Collaborative Filtering Technique by Removing Shilling Effect
Web page content mining is traditional searching of Web pages with the help of content, while Search results mining is a further search of pages found from a previous search. Web content mining has an approach that is shilling effect in which rating can be done and gives improper results. In this work we proposed an algorithm which gives appropriate values than shilling effects. Keywords, web m...
Implementing a Rating-Based Item-to-Item Recommender System in PHP/SQL
User personalization and profiling are key to many successful Web sites. Consider that there is considerable free content on the Web, but comparatively few tools to help us organize or mine such content for specific purposes. One solution is to ask users to rate resources so that they can help each other find better content: we call this rating-based collaborative filtering. This paper presents a...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Publication date: 2005